Downloaded from rsta.royalsocietypublishing.org on July 1,2012 


PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY A

MATHEMATICAL, PHYSICAL & ENGINEERING SCIENCES


Mathematical Contributions to the Theory of Evolution. X. 
Supplement to a Memoir on Skew Variation 

Karl Pearson 

Phil. Trans. R. Soc. Lond. A 1901 197, 443-459 
doi: 10.1098/rsta.1901.0023












XI. Mathematical Contributions to the Theory of Evolution. X. Supplement to a Memoir on Skew Variation.*

By Karl Pearson, F.R.S., University College, London. 


Received May 22,—Read, June 20, 1901. 

(1.) In a memoir on Skew Variation published in the 'Phil. Trans.,' A, vol. 186, 1895, a series of frequency curves are discussed which are integrals of the differential equation

$$\frac{1}{y}\frac{dy}{dx} = \frac{-x}{c_1 + c_2 x + c_3 x^2} \qquad \text{(i.)}$$

(See p. 381 of the memoir.)

The discussion of four main types is given in detail, and a brief reference is made to various sub-types which may occur. The types considered in that memoir covered at the time all the frequency series, and they were fairly numerous, that I had had occasion to deal with. In the course of the last few years, however, I have been somewhat puzzled by frequency distributions for which the criterion $2\beta_2 - 3\beta_1 - 6$ (see p. 378) was positive, and therefore à priori a curve of the type

$$y = y_0 \left(1 + \frac{x^2}{a^2}\right)^{-m} e^{-\nu \tan^{-1}(x/a)}$$

was to be expected, but which on calculation gave $\nu$ imaginary. The frequency distributions in question arose† occasionally in sociological statistics, but also in

* 'Phil. Trans.,' A, vol. 186, p. 343.

† Some other frequency distributions, which on first investigation fell under Types V. and VI. of the present paper, were found with improved values for the moments to fall under types already discussed. Mr. W. F. Sheppard's values for the moments ('Lond. Math. Soc.,' vol. 29, p. 369, formula 30) should certainly be used in preference to those given by me ('Phil. Trans.,' A, vol. 186, p. 350) whenever we are calculating the moments of a curve from areas and not from true ordinates. I hope shortly to publish a paper on this point, which is one really of quadrature formulae. Meanwhile for every true frequency curve with high contact at both terminals we ought to use

$$\mu_2 = c^2\left(\nu_2 - \nu_1^2 - \tfrac{1}{12}\right),$$

$$\mu_4 = c^4\left\{\nu_4 - 4\nu_1\nu_3 + 6\nu_1^2\nu_2 - 3\nu_1^4 - \tfrac{1}{2}\left(\nu_2 - \nu_1^2\right) + \tfrac{7}{240}\right\}$$

instead of the values given on p. 350, $\mu_3$ remaining unchanged.


biological investigations. It seemed, therefore, desirable to enter a little more fully into the analysis of the cases in which the criterion was positive but $\nu$ imaginary, and discover what types of frequency curves had escaped my attention.* The key to the solution lies in the fact noted on p. 369 of the memoir, namely, that even if the criterion be positive, there will still be a solution akin to Type I. and not to Type IV. if $\epsilon$ be negative. No frequency series satisfying these conditions had at that time come under my notice, and later, when collecting data of floral variability, my own remark as to $\epsilon$ had slipped from my memory. It is the object of this supplement to obtain an improved criterion of type, to discuss the nature of the curves which fill the gap observed, and to illustrate by one or two examples the fitting of such curves to actual statistics.


(2.) The Two Criteria.

Throughout this supplement the notation of the previous memoir will be assumed to be familiar to the reader.

Turning to p. 378 of that memoir, we note that since $\beta_1$ and $r - 1$ are necessarily positive, $\nu$ can only become imaginary if the quantity $16(r-1) - \beta_1(r-2)^2$ under the radical be negative, or

$$\frac{\beta_1 (r-2)^2}{16(r-1)} > 1.$$

Substitute in this the value of $r$ and it becomes

$$\frac{\beta_1(\beta_2+3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)} > 1 \qquad \text{(ii.)}$$
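The substitution takes only a few lines. The sketch below assumes, as in the memoir's treatment of Type IV. (not reproduced in this supplement), that $r = 6(\beta_2 - \beta_1 - 1)/(2\beta_2 - 3\beta_1 - 6)$, and writes $\kappa_1$ for that denominator:

$$\begin{aligned}
r - 2 &= \frac{6(\beta_2 - \beta_1 - 1) - 2(2\beta_2 - 3\beta_1 - 6)}{\kappa_1} = \frac{2(\beta_2 + 3)}{\kappa_1},\\
r - 1 &= \frac{6(\beta_2 - \beta_1 - 1) - (2\beta_2 - 3\beta_1 - 6)}{\kappa_1} = \frac{4\beta_2 - 3\beta_1}{\kappa_1},\\
\frac{\beta_1 (r-2)^2}{16(r-1)} &= \frac{4\beta_1(\beta_2+3)^2/\kappa_1^2}{16(4\beta_2 - 3\beta_1)/\kappa_1} = \frac{\beta_1(\beta_2+3)^2}{4(4\beta_2 - 3\beta_1)\,\kappa_1}.
\end{aligned}$$

Since $\beta_1$ and $4\beta_2 - 3\beta_1$ are positive for any distribution, the sign of this quantity follows the sign of $\kappa_1$.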


Hence the complete condition that a curve of Type IV. shall give the distribution of frequency is not only

$$\kappa_1 = 2\beta_2 - 3\beta_1 - 6 > 0,$$

but also

$$\kappa_2 = \frac{\beta_1(\beta_2+3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)} < 1.$$


Turning back to p. 369, we see that, $\epsilon$ being positive, the complete conditions for a curve of Type I. giving the distribution of frequency are

$$2\beta_2 - 3\beta_1 - 6 < 0,$$


* I was very loath to adopt Professor Edgeworth's method of inventing new frequency curves by putting $x = f(x')$ in a normal frequency distribution, $y = y_0 e^{-cx^2}$. Besides strong theoretical objections to this process, I had found Equation (i.) so sufficient for a great variety of cases that I felt confident it must cover the newly discovered outstanding cases, and this confidence seems justified by the result.
and

$$\kappa_2 = \frac{\beta_1(\beta_2+3)^2}{4(4\beta_2 - 3\beta_1)(2\beta_2 - 3\beta_1 - 6)} < 0.$$


The latter condition will be always satisfied since $\beta_1$ and $4\beta_2 - 3\beta_1$ are positive for any distribution whatever, and $2\beta_2 - 3\beta_1 - 6$ is negative by hypothesis.

Further, in the previous case $\kappa_2$ is seen to be essentially positive.

Hence the criteria written down cover all possible cases but those for which

$$\kappa_2 > 1.$$

Sub-cases which arise from transition curves just at the limits will, however, be likely to be of interest. What happens when $\kappa_2 = \infty$ and when $\kappa_2 = 1$? The only possibility for $\kappa_2 = \infty$ is $2\beta_2 - 3\beta_1 = 6$, or $\kappa_1 = 0$. But this curve has been fully treated under Type III. in the memoir.

We shall see later that $\kappa_2 = 1$ leads us up to a novel transition curve of considerable interest.

To ascertain something about the general case in which $\kappa_2 > 1$, let us return to the memoir again and examine the value of $\epsilon$ on p. 369. It can only be negative if

$$4 + \tfrac{1}{4}\beta_1\frac{(r+2)^2}{r+1} < 0,$$

where $r$ is here

$$r = \frac{6(\beta_2 - \beta_1 - 1)}{3\beta_1 - 2\beta_2 + 6}.$$

Substituting, we find at once

$$\kappa_2 > 1,$$

which in itself involves $\kappa_1 > 0$.
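The reduction can be written out in full. With $\kappa_1 = 2\beta_2 - 3\beta_1 - 6$, the Type I. value of $r$ just given is $-6(\beta_2 - \beta_1 - 1)/\kappa_1$, so that

$$\begin{aligned}
r + 2 &= -\frac{2(\beta_2+3)}{\kappa_1}, \qquad r + 1 = -\frac{4\beta_2 - 3\beta_1}{\kappa_1},\\
4 + \tfrac{1}{4}\beta_1\frac{(r+2)^2}{r+1} &= 4 - \frac{\beta_1(\beta_2+3)^2}{\kappa_1(4\beta_2 - 3\beta_1)} = 4(1 - \kappa_2),
\end{aligned}$$

and the condition above for a negative $\epsilon$ is exactly $\kappa_2 > 1$.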

Hence the missing gap corresponds to those cases in which $\epsilon$ is negative.

It will be clear that $\kappa_2$, although in form giving a more complex criterion than $\kappa_1$, is really more effective, as covering all the possible cases. We have then the following scheme:


Criterion $\kappa_2$.                                 Corresponding frequency curve.

$\kappa_2 = \infty$                                   Transition curve, Type III. (Memoir, p. 373).
$\kappa_2 > 1$ and $< \infty$                         Type VI. (see p. 448 below).
$\kappa_2 = 1$                                        Transition curve, Type V. (see p. 446 below).
$\kappa_2 > 0$ and $< 1$                              Type IV. (Memoir, p. 376).
$\kappa_2 = 0$, $\beta_1 = 0$, $\beta_2 = 3$          Normal curve.
$\kappa_2 = 0$, $\beta_1 = 0$, $\beta_2 \neq 3$       Type II. (Memoir, p. 372).
$\kappa_2 < 0$                                        Type I. (Memoir, p. 367).
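The scheme lends itself to a small lookup. The sketch below is an illustration rather than part of the original method: the function names are hypothetical, and the tolerance `tol` used to detect the transition cases ($\kappa_2 = 0$, $1$, $\infty$) is an assumption, since moments computed from data never hit those values exactly.

```python
import math


def kappa_criteria(beta1: float, beta2: float) -> tuple[float, float]:
    """Return (kappa1, kappa2) of section (2.); kappa2 is infinite when kappa1 = 0."""
    kappa1 = 2 * beta2 - 3 * beta1 - 6
    if kappa1 == 0:
        return kappa1, math.inf
    kappa2 = beta1 * (beta2 + 3) ** 2 / (4 * (4 * beta2 - 3 * beta1) * kappa1)
    return kappa1, kappa2


def curve_type(beta1: float, beta2: float, tol: float = 1e-6) -> str:
    """Pick the frequency curve from the kappa2 scheme (hypothetical helper)."""
    if abs(beta1) < tol:                  # kappa2 = 0: symmetrical curves
        return "Normal" if abs(beta2 - 3) < tol else "Type II"
    kappa1, kappa2 = kappa_criteria(beta1, beta2)
    if abs(kappa1) < tol:                 # kappa2 = infinity
        return "Type III"
    if kappa2 < 0:
        return "Type I"
    if abs(kappa2 - 1) < tol:
        return "Type V"
    return "Type IV" if kappa2 < 1 else "Type VI"


# Brides' ages of section (6.): beta1 = 1.9396, beta2 = 6.8873
print(curve_type(1.9396, 6.8873))
```

Checking the symmetrical case first avoids the $0/0$ that $\kappa_2$ would give for the normal curve, where $\kappa_1$ also vanishes.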
The object of this supplement is to discuss the calculation of curves of Type V., and to consider those of Type VI. somewhat more at length, they being only briefly referred to on p. 369 of the memoir. It will be seen that Type I. of the memoir has now broken up into two divisions. One portion is the old Type I. passing into the normal curve on one side and Type III. on the other. This Type III. separates the second portion, Type VI., of the old Type I. from the first portion. Type VI. passes from Type III. to the new transition curve Type V., which, like Type III., will be found to have a range limited in one direction only. Finally this new Type V. is the transition to the old Type IV., bounded on the other side by the sub-curve, the old Type II., and beyond that the normal curve. Thus we see that Types I. and IV. do not pass directly into each other through Type III., as might be supposed by the criterion $\kappa_1 >$ or $< 0$, but that there is a series of intervening curves, two of which, Types V. and VI., require further consideration, if we are to complete the whole round of frequency distributions embraced under the differential equation (i.).

(3.) On the Frequency Curve of Type V. 

Returning to the fundamental differential equation (i.), let us consider what transformation takes place when the denominator on the right has equal roots.* We may then write it in the form

$$\frac{1}{y}\frac{dy}{dx} = \frac{-x}{c_0 (c_1 + x)^2} = \frac{c_1}{c_0 (c_1 + x)^2} - \frac{1}{c_0 (c_1 + x)}.$$

Hence

$$\log y = -\frac{c_1}{c_0 (c_1 + x)} - \frac{1}{c_0}\log (c_1 + x) + \text{const.}$$

Thus

$$y = y_0\, e^{-\gamma/(c_1 + x)}\, (c_1 + x)^{-p},$$

where $y_0$ is a constant, $\gamma = c_1/c_0$ and $p = 1/c_0$. Thus, changing the origin, we may write the curve:

$$y = y_0\, x^{-p} e^{-\gamma/x},$$

where $x_{mo} = \gamma/p$ gives the distance of the mode from the new origin.

To find the moments about this origin, we notice that, $p$ and $\gamma$ being positive, $y = 0$ when $x = 0$ and when $x = \infty$. Thus, as in the curve of Type III., we have a range limited at one end only.

To find the moments we have, if $\alpha$ be the area,

$$\alpha\mu_n' = \int_0^\infty y_0\, x^{n-p} e^{-\gamma/x}\, dx \qquad \text{(iv.)}$$

* I owe to Miss Agnes Kelly, Ph.D., the suggestion that this type of frequency curve deserved fuller 
treatment. 
Put $\gamma/x = z$, and we find

$$\alpha\mu_n' = y_0\, \gamma^{n-p+1} \int_0^\infty z^{p-n-2} e^{-z}\, dz = y_0\, \gamma^{n-p+1}\, \Gamma(p-n-1) \qquad \text{(v.)}$$

Thus:

$$\alpha = y_0\, \gamma^{1-p}\, \Gamma(p-1) \qquad \text{(vi.)}$$

$$\begin{aligned}
\mu_1' &= \frac{\gamma}{p-2},\\
\mu_2' &= \frac{\gamma^2}{(p-2)(p-3)},\\
\mu_3' &= \frac{\gamma^3}{(p-2)(p-3)(p-4)},\\
\mu_4' &= \frac{\gamma^4}{(p-2)(p-3)(p-4)(p-5)}.
\end{aligned} \qquad \text{(vii.)}$$


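Transferring to the centroid we find

$$\begin{aligned}
\mu_2 &= \frac{\gamma^2}{(p-2)^2 (p-3)},\\
\mu_3 &= \frac{4\gamma^3}{(p-2)^3 (p-3)(p-4)},\\
\mu_4 &= \frac{3(p+4)\,\gamma^4}{(p-2)^4 (p-3)(p-4)(p-5)},
\end{aligned} \qquad \text{(viii.)}$$

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{16(p-3)}{(p-4)^2} \qquad \text{(ix.)}$$

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{3(p+4)(p-3)}{(p-5)(p-4)} \qquad \text{(x.)}$$

Eliminating $p$ between $\beta_1$ and $\beta_2$ we find, after some reductions:

$$\beta_1(\beta_2+3)^2 = 4(2\beta_2 - 3\beta_1 - 6)(4\beta_2 - 3\beta_1),$$

or

$$\kappa_2 = 1 \qquad \text{(xi.)}$$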
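These reductions are easy to check numerically. The sketch below (helper names are illustrative, not Pearson's) builds the raw moments (vii.) from the Gamma-function form (v.) divided by (vi.), transfers them to the centroid, and compares the resulting $\beta_1$, $\beta_2$ with (ix.) and (x.) and with $\kappa_2 = 1$. Taking $\gamma = 1$ is an arbitrary assumption, harmless because $\beta_1$ and $\beta_2$ do not depend on it; $p$ must exceed 5 for $\mu_4$ to exist.

```python
import math


def type_v_raw_moment(n: int, p: float, gamma: float) -> float:
    """mu'_n about the Type V origin: gamma**n * Gamma(p-n-1) / Gamma(p-1)."""
    return gamma ** n * math.exp(math.lgamma(p - n - 1) - math.lgamma(p - 1))


def type_v_betas(p: float, gamma: float = 1.0) -> tuple[float, float]:
    m1, m2, m3, m4 = (type_v_raw_moment(n, p, gamma) for n in range(1, 5))
    # transfer the raw moments to the centroid
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return mu3 ** 2 / mu2 ** 3, mu4 / mu2 ** 2


def kappa2(beta1: float, beta2: float) -> float:
    return beta1 * (beta2 + 3) ** 2 / (
        4 * (4 * beta2 - 3 * beta1) * (2 * beta2 - 3 * beta1 - 6))


for p in (6.5, 8.66184, 13.150747, 40.0):
    b1, b2 = type_v_betas(p)
    print(f"p = {p}: beta1 = {b1:.6f} vs (ix.) {16 * (p - 3) / (p - 4) ** 2:.6f}, "
          f"beta2 = {b2:.6f} vs (x.) {3 * (p + 4) * (p - 3) / ((p - 5) * (p - 4)):.6f}, "
          f"kappa2 = {kappa2(b1, b2):.6f}")
```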

Clearly, since this is the condition for Type V., that transition curve is none other 
than the curve obtained by making the denominator of the right-hand side of the 
differential equation have equal roots. The curve is clearly of considerable interest, 
and its existence had not been noticed in the previous series of frequency curves. 

The manner of fitting it is now easily described. 

Equation (ix.) gives us a quadratic to find $p - 4$:

$$(p-4)^2 - \frac{16}{\beta_1}(p-4) - \frac{16}{\beta_1} = 0 \qquad \text{(xii.)}$$

The positive root of this is the required solution.
$\gamma$ is then found from the first of equations (viii.), or, if $\sigma = \sqrt{\mu_2}$ be the standard deviation, then

$$\gamma = \sigma (p-2)\sqrt{p-3} \qquad \text{(xiii.)}$$

Then (vi.) gives:

$$y_0 = \frac{\alpha\, \gamma^{p-1}}{\Gamma(p-1)} \qquad \text{(xiv.)}$$

which determines $y_0$, the remaining constant for the shape of the curve.

For the position of the curve, we have for the distance from origin to mean, from the first of equations (vii.):

$$\mu_1' = \frac{\gamma}{p-2} = \sigma\sqrt{p-3} \qquad \text{(xv.)}$$

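If $d$ be the distance from mode to mean we have:

$$d = \mu_1' - \frac{\gamma}{p} = \frac{\gamma}{p-2} - \frac{\gamma}{p} = \frac{2\gamma}{p(p-2)} \qquad \text{(xvi.)}$$

Further, the skewness:

$$\text{Sk.} = \frac{d}{\sigma} = \frac{2\sqrt{p-3}}{p} \qquad \text{(xvii.)}$$

Thus the solution is completed.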
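The fitting steps (xii.) to (xvii.) can be collected into one routine. This is a sketch under stated assumptions, not Pearson's own arithmetic: the names `fit_type_v` and `type_v_ordinate` are hypothetical, the area $\alpha$ is taken to be the total frequency $n$, and, following the remark in the medusa example of section (7.) that $\gamma$ must take the sign of $\mu_3$, a negative $\mu_3$ is handled by measuring $x$ towards lesser values. Because the constants are recomputed from the moments, the last figures may differ slightly from those printed in the paper.

```python
import math


def fit_type_v(mean: float, mu2: float, mu3: float, n: float) -> dict:
    """Moment fit of y = y0 * x**(-p) * exp(-gamma/x), equations (xii.)-(xvii.)."""
    beta1 = mu3 ** 2 / mu2 ** 3
    k = 16 / beta1
    p = 4 + (k + math.sqrt(k * k + 4 * k)) / 2      # positive root of (xii.)
    sigma = math.sqrt(mu2)
    gamma = sigma * (p - 2) * math.sqrt(p - 3)      # (xiii.), magnitude only
    # (xiv.) taken in logarithms: y0 = n * gamma**(p - 1) / Gamma(p - 1)
    log10_y0 = (math.log(n) + (p - 1) * math.log(gamma)
                - math.lgamma(p - 1)) / math.log(10)
    sign = 1.0 if mu3 > 0 else -1.0                 # direction of the x axis
    return {
        "p": p,
        "gamma": sign * gamma,
        "log10_y0": log10_y0,
        "start": mean - sign * sigma * math.sqrt(p - 3),   # (xv.)
        "mode": mean - sign * 2 * gamma / (p * (p - 2)),   # (xvi.)
        "skewness": 2 * math.sqrt(p - 3) / p,              # (xvii.)
    }


def type_v_ordinate(fit: dict, value: float) -> float:
    """Ordinate of the fitted curve at `value` on the original scale."""
    sign = 1.0 if fit["gamma"] > 0 else -1.0
    x = sign * (value - fit["start"])              # distance from the start
    if x <= 0:
        return 0.0
    return (10 ** fit["log10_y0"] * x ** -fit["p"]
            * math.exp(-abs(fit["gamma"]) / x))


brides = fit_type_v(22.1877, 13.3346, 67.8145, 28454)
lips = fit_type_v(4.8685, 0.309006, -0.350697, 996)
for name, fit in (("brides", brides), ("lips", lips)):
    print(name, {key: round(v, 4) for key, v in fit.items()})
```

With the moments of section (6.) this gives $p$ close to 13.15, a start near 10.55 years and a mode near 20.42 years; with those of section (7.) a start near 6.19 lips and a mode near 5.17 lips, in line with the figures in the text.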


(4.) On the Frequency Curve of Type VI.

Type VI., as we have seen, corresponds to the case in which Type I. of the memoir has its $\epsilon$ negative. Hence either $m_1$ or $m_2$ is negative, and the curve, transferring the origin, takes the form

$$y = y_0\, \frac{(x-a)^{m_1}}{x^{m_2}} \qquad \text{(xviii.)}$$

Now it is possible that this curve falls under the limited range type of a frequency from $x = 0$ to $x = a$, but as we see that the criterion places Type VI. between two curves of range limited in one direction only, we expect Type VI. also to be of that character, and a complete solution is obtained by taking the range from $x = a$ to $x = \infty$; this indeed fills up the gap for $\kappa_2 > 1$ and $< \infty$, and (xviii.) with this range is seen to pass into one or other of the two transition curves

$$y = y_0\, x^{p} e^{-\gamma x}$$

or

$$y = y_0\, x^{-p} e^{-\gamma/x},$$

according as we allow the first or second factor to approach a limit.†

* The sign of $\mu_3$ will determine the sign of $\gamma$, or, what may be taken as the same thing, the direction of the axis of $x$.

† Write $y = \text{const.} \times x^{-m_2}(1 - x/a)^{m_1}$, and make $m_1 = \infty$ and $a = \infty$, but $m_1/a$ finite.

Or, $y = \text{const.} \times (1 - a/x)^{m_1}/x^{m_2 - m_1}$, and make $a = 0$, $m_1 = \infty$, and $a m_1$ together with $m_2 - m_1$ finite.
Accordingly we shall write Type VI. in the form

$$y = y_0\, \frac{(x-a)^{q_2}}{x^{q_1}} \qquad \text{(xix.)}$$

and take the range from $a$ to $\infty$.

Differentiating to find the position of the mode we have

$$x_{mo} = \frac{a q_1}{q_1 - q_2} \qquad \text{(xx.)}$$


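For the moments about the origin:

$$\alpha\mu_n' = y_0 \int_a^\infty \frac{x^n (x-a)^{q_2}}{x^{q_1}}\, dx.$$

Put $a/x = z$; hence

$$\begin{aligned}
\alpha\mu_n' &= y_0\, a^{n - q_1 + q_2 + 1} \int_0^1 z^{q_1 - q_2 - n - 2} (1-z)^{q_2}\, dz\\
&= y_0\, a^{n - q_1 + q_2 + 1}\, B(q_1 - q_2 - n - 1,\; q_2 + 1)\\
&= \frac{y_0\, \Gamma(q_1 - q_2 - n - 1)\, \Gamma(q_2 + 1)}{a^{q_1 - q_2 - n - 1}\, \Gamma(q_1 - n)}.
\end{aligned}$$

Hence we deduce

$$\alpha = \frac{y_0\, \Gamma(q_1 - q_2 - 1)\, \Gamma(q_2 + 1)}{a^{q_1 - q_2 - 1}\, \Gamma(q_1)} \qquad \text{(xxi.)}$$

$$\begin{aligned}
\mu_1' &= \frac{a (q_1 - 1)}{q_1 - q_2 - 2},\\
\mu_2' &= \frac{a^2 (q_1 - 1)(q_1 - 2)}{(q_1 - q_2 - 2)(q_1 - q_2 - 3)},\\
\mu_3' &= \frac{a^3 (q_1 - 1)(q_1 - 2)(q_1 - 3)}{(q_1 - q_2 - 2)(q_1 - q_2 - 3)(q_1 - q_2 - 4)},\\
\mu_4' &= \frac{a^4 (q_1 - 1)(q_1 - 2)(q_1 - 3)(q_1 - 4)}{(q_1 - q_2 - 2)(q_1 - q_2 - 3)(q_1 - q_2 - 4)(q_1 - q_2 - 5)}.
\end{aligned} \qquad \text{(xxii.)}$$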
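The substitution $z = a/x$ behind (xxi.) can be spelled out. With $x = a/z$, $x - a = a(1-z)/z$ and $dx = -a\,z^{-2}\,dz$, the limits $x = a, \infty$ become $z = 1, 0$, and the sign of $dx$ restores the usual order of the limits:

$$\frac{x^n (x-a)^{q_2}}{x^{q_1}}\,dx \;\longrightarrow\; a^{n - q_1 + q_2 + 1}\, z^{q_1 - q_2 - n - 2}(1-z)^{q_2}\,dz,$$

so that the integral is a complete Beta function. Dividing the general result by $\alpha$ gives each line of (xxii.); for instance

$$\mu_1' = a\,\frac{\Gamma(q_1 - q_2 - 2)}{\Gamma(q_1 - q_2 - 1)}\cdot\frac{\Gamma(q_1)}{\Gamma(q_1 - 1)} = \frac{a(q_1 - 1)}{q_1 - q_2 - 2}.$$

The moment of order $n$ exists only when $q_1 - q_2 - n - 1 > 0$.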


Now if we compare these results with those on p. 368 of the earlier memoir we see that the one set can be at once deduced from the other by writing $m_1 = -q_1$, $m_2 = q_2$. Thus with this interchange the whole of that solution holds, if we bear in mind that the range is now from $x = a$ to $\infty$.

We easily find:

$$r = -q_1 + q_2 + 2 = (1 - q_1) + (q_2 + 1),$$

and $1 - q_1$ and $q_2 + 1$ are the roots of

$$z^2 - r z + \epsilon = 0 \qquad \text{(xxiii.)}$$

where $r$ and $\epsilon$ are to be determined as in that memoir, pp. 368-369. We have:

$$a^2 = \frac{\mu_2\, r^2 (r+1)}{(1 - q_1)(1 + q_2)} \qquad \text{(xxiv.)}$$

where $1 - q_1$ and $r$ are both negative.* This gives $a$. Thus $q_1$, $q_2$, and $a$ are known, and from Equation (xxi.)

$$y_0 = \frac{\alpha\, a^{q_1 - q_2 - 1}\, \Gamma(q_1)}{\Gamma(q_1 - q_2 - 1)\, \Gamma(q_2 + 1)} \qquad \text{(xxv.)}$$

we find the remaining unknown constant for the shape of the curve, $y_0$. As before, various approximations may be used to the values of the $\Gamma$ functions when either $q_1$ or $q_2$ or both are large.†
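The step from (xxii.) to (xxiv.) uses only $\mu_2 = \mu_2' - \mu_1'^2$. Writing out the bracket,

$$\mu_2 = \frac{a^2 (q_1 - 1)\,\{(q_1 - 2)(q_1 - q_2 - 2) - (q_1 - 1)(q_1 - q_2 - 3)\}}{(q_1 - q_2 - 2)^2 (q_1 - q_2 - 3)}
= \frac{a^2 (q_1 - 1)(q_2 + 1)}{(q_1 - q_2 - 2)^2 (q_1 - q_2 - 3)},$$

and since $r = 2 - q_1 + q_2$ gives $q_1 - q_2 - 2 = -r$ and $q_1 - q_2 - 3 = -(r + 1)$, this is

$$\mu_2 = \frac{a^2 (1 - q_1)(1 + q_2)}{r^2 (r+1)}, \qquad (1 - q_1)(1 + q_2) = \epsilon \text{ by (xxiii.)}.$$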

We easily obtain for the distance between mode and mean

$$d = \frac{a (q_1 + q_2)}{(q_1 - q_2)(q_1 - q_2 - 2)} \qquad \text{(xxvi.)}$$

and for the skewness:

$$\text{Sk.} = \frac{(q_1 + q_2)\sqrt{q_1 - q_2 - 3}}{(q_1 - q_2)\sqrt{(q_1 - 1)(q_2 + 1)}} \qquad \text{(xxvii.)}$$
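The steps (xxiii.) to (xxvii.) can likewise be gathered into a sketch. Here $r$ and $\epsilon$ are computed from $\beta_1$, $\beta_2$ with the Type I. expressions met in section (2.), namely $r = 6(\beta_2 - \beta_1 - 1)/(3\beta_1 - 2\beta_2 + 6)$ and $\epsilon = r^2/\{4 + \tfrac{1}{4}\beta_1 (r+2)^2/(r+1)\}$; the second is an assumption about the memoir's formula on p. 369, which this supplement does not reproduce. The function name `fit_type_vi` is hypothetical, the area is taken as the total frequency, and only a positive $\mu_3$ (curve starting on the left) is handled.

```python
import math


def fit_type_vi(mean: float, mu2: float, mu3: float, mu4: float, n: float) -> dict:
    """Moment fit of y = y0 (x - a)**q2 / x**q1 on a < x < infinity (mu3 > 0 only)."""
    if mu3 <= 0:
        raise ValueError("sketch assumes positive skewness")
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    r = 6 * (beta2 - beta1 - 1) / (3 * beta1 - 2 * beta2 + 6)
    eps = r ** 2 / (4 + 0.25 * beta1 * (r + 2) ** 2 / (r + 1))
    if eps >= 0:
        raise ValueError("epsilon is not negative: not a Type VI distribution")
    # (xxiii.): roots of z**2 - r z + eps = 0; the negative root is 1 - q1
    disc = math.sqrt(r * r - 4 * eps)
    neg_root, pos_root = (r - disc) / 2, (r + disc) / 2
    q1, q2 = 1 - neg_root, pos_root - 1
    a = math.sqrt(mu2 * r ** 2 * (r + 1) / eps)                  # (xxiv.)
    log10_y0 = (math.log(n) + (q1 - q2 - 1) * math.log(a) + math.lgamma(q1)
                - math.lgamma(q1 - q2 - 1) - math.lgamma(q2 + 1)) / math.log(10)  # (xxv.)
    origin = mean - a * (q1 - 1) / (q1 - q2 - 2)                 # first line of (xxii.)
    return {
        "q1": q1, "q2": q2, "a": a, "log10_y0": log10_y0,
        "origin": origin,
        "start": origin + a,
        "mode": origin + a * q1 / (q1 - q2),                     # (xx.)
        "skewness": (q1 + q2) * math.sqrt(q1 - q2 - 3)
                    / ((q1 - q2) * math.sqrt((q1 - 1) * (q2 + 1))),  # (xxvii.)
    }


brides = fit_type_vi(22.1877, 13.3346, 67.8145, 1224.6342, 28454)
print({key: round(v, 4) for key, v in brides.items()})
```

Because $4 + \tfrac{1}{4}\beta_1 (r+2)^2/(r+1)$ is a small difference of larger numbers when $\kappa_2$ is near 1, $\epsilon$ is sensitive to rounding in the moments; the start, mode and skewness are much less so.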


(5.) A special case of some interest arises when the start of the curve is a priori known. Suppose its distance from the mean to be $c$ and let (using moments about the centroid)

$$\frac{\mu_2}{c^2} = \gamma_2, \qquad \frac{\mu_3}{2 c \mu_2} = \gamma_3 \qquad \text{(xxviii.)}$$

Then we easily find:

$$\gamma_2 = \frac{1 - q_1}{(1 + q_2)(-q_1 + q_2 + 3)}, \qquad \gamma_3 = \frac{q_1 + q_2}{(1 + q_2)(q_1 - q_2 - 4)}.$$


* $1 - q_1$ being negative, $\epsilon$ is negative, and accordingly by what goes before $\kappa_2$ lies between 1 and $\infty$.

† The value of $y_0$ for curves of Type I., if $m_1$ be small but $m_2$ large ('Phil. Trans.,' A, vol. 186, p. 369, foot-note), is

[formula illegible in the scanned original]

and this can be easily modified to suit (xxv.) above. A very convenient and exact formula for $\Gamma(n+1)$, if $n$ be large, is that given by Forsyth ('B.A. Report,' 1883, p. 47):

$$\Gamma(n+1) = \sqrt{2\pi}\,\left(n^2 + n + \tfrac{1}{6}\right)^{\frac{1}{2}\left(n + \frac{1}{2}\right)} e^{-\left(n + \frac{1}{2}\right)},$$

the error being less than $1/(240 n^3)$ of the whole.
Whence we deduce, to determine $q_1$ and $q_2$:

$$q_2 - q_1 = \frac{1 - 3\gamma_2 + 4\gamma_3}{\gamma_2 - \gamma_3}, \qquad
q_1 + q_2 = \frac{\gamma_3 (1 + \gamma_2)(\gamma_2 - 1 - 2\gamma_3)}{(2\gamma_2 - \gamma_3 + \gamma_2\gamma_3)(\gamma_2 - \gamma_3)} \qquad \text{(xxix.)}$$

and the solution proceeds as before.
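When the start is known, (xxviii.) and (xxix.) replace the use of $\beta_2$. A minimal sketch follows; the function name is hypothetical, and $a$ is then taken from $c = \mu_1' - a = a(q_2 + 1)/(q_1 - q_2 - 2)$, which follows from the first line of (xxii.) and is the relation used in the scarlet-fever example of section (8.).

```python
def type_vi_known_start(c: float, mu2: float, mu3: float) -> dict:
    """q1, q2 and a for Type VI when the start lies a distance c below the mean."""
    g2 = mu2 / c ** 2                   # gamma_2 of (xxviii.)
    g3 = mu3 / (2 * c * mu2)            # gamma_3 of (xxviii.)
    diff = (1 - 3 * g2 + 4 * g3) / (g2 - g3)                  # q2 - q1, (xxix.)
    total = (g3 * (1 + g2) * (g2 - 1 - 2 * g3)
             / ((2 * g2 - g3 + g2 * g3) * (g2 - g3)))         # q1 + q2, (xxix.)
    q1, q2 = (total - diff) / 2, (total + diff) / 2
    a = c * (q1 - q2 - 2) / (q2 + 1)    # from c = a (q2 + 1) / (q1 - q2 - 2)
    return {"gamma2": g2, "gamma3": g3, "q1": q1, "q2": q2, "a": a}


# Scarlet fever, section (8.): five-year units, c = 1.72195
print(type_vi_known_start(1.72195, 1.369345, 3.233194))
```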


(6.) Illustrations. I propose to note a few distributions of frequency in which I have come across Types V. and VI.


(A.) Statistics of Age of Bride at Marriage, the Bridegroom's Age being between 24 and 25 years.*

The observations given in the table, p. 454, are taken from Perozzo's memoir: "Nuove Applicazioni del Calcolo delle Probabilità . . . ," 'Reale Accademia dei Lincei,' Anno CCLXXIX., 1881-2, Tavola I.

The total number of recorded marriages is 28,454. The moments were calculated by using Sheppard's corrections ('London Math. Soc. Proc.,' vol. 29, p. 369), and are as follows:

Mean age of bride = 22.1877.

$\mu_2 = 13.3346$,
$\mu_3 = 67.8145$,
$\mu_4 = 1224.6342$.

Whence:

$\beta_1 = 1.9396$,
$\beta_2 = 6.8873$,
$\kappa_1 = 1.9558$,
$\kappa_2 = 1.1094$.

Thus by p. 445 we see that Type VI. is the frequency curve to be selected, but as $\kappa_2$ does not differ widely from unity, we shall probably get a good fit from Type V. as well.

Taking Type VI. first, we find:

$$r = -12.11075, \qquad \epsilon = -317.84987.$$

The quadratic (xxiii.) is accordingly:

$$z^2 + 12.11075\, z - 317.84987 = 0.$$

* I selected this example at random, as one out of several leading to the curve types it was my 
object to illustrate. There is so much tampering with statistics, however, whenever they refer to the ages 
of women, that it would probably have been better to have used the men. 

Thus:

$$q_1 = 25.88401, \qquad q_2 = 11.77326.$$

Hence by (xxiv.) $a = 8.268405$, and by (xxv.) $\log y_0 = 24.2753032$.

We have accordingly for the equation to the curve:

$$y = 1.884965 \times 10^{24} \times \frac{(x - 8.268405)^{11.77326}}{x^{25.88401}}.$$


The distance from the origin to the mean is given by the first equation of (xxii.):

$$\mu_1' = 16.98913,$$

or, the theoretical range starts with brides of $5.198570 + 8.268405 = 13.466975$ years. This is an excellent underlimit to the age of women marrying men of 24 to 25 in a country like Italy. Our first group is at 15.5, and the above start is just two base units before this initial group.

The skewness $= .498953$, and the distance from mode to mean $= 1.822004$, or the mode is at 20.3657 years.

Turning now to Type V. we have the following results:

$$16/\beta_1 = 8.249262.$$

Hence Equation (xii.) is:

$$(p - 4)^2 - 8.249262\,(p - 4) - 8.249262 = 0.$$

Thus the positive value of $p$ is:

$$p = 13.150747.$$

Equation (xiii.) gives:

$$\gamma = 129.73081.$$

Then (xiv.) gives:

$$\log y_0 = 22.3676952.$$

Thus the equation to the curve is:

$$y = 2.331821 \times 10^{22} \times x^{-13.150747}\, e^{-129.73081/x}.$$

To find the position of its start we have by (xv.):

$$\mu_1' = 11.6343,$$

or, since the mean age of brides is 22.1877, the youngest possible theoretical bride is 10.5534 years. This is probably a worse determination of the underlimit than in the case of Type VI. At the same time I notice that out of about 180,000 women, 101 were married between 14 and 15 years of age, and all the curves begin with a sensibly finite ordinate at 14.5; it is accordingly possible that a somewhat lower age than 13.5 actually occurs in Italy.

Equation (xvi.) gives us for the distance from mode to mean:

$$d = 1.7694,$$

or the modal age at marriage is 20.4183 years. This is only about .053 of a year, or about 19 days, different from the modal age as given by Type VI., a most satisfactory agreement.

For the skewness we have from Equation (xvii.):

$$\text{Sk.} = .4845,$$

or, it differs by less than 3 per cent. from the skewness as given by Type VI.

The diagram (fig. i.) shows the two curves, and the table compares the results obtained from either with the observations.*

* The observation data are really areas, while to save lengthy calculations we have compared both in diagram and table the ordinates of the theoretical curves. This is in general legitimate if, as in this case, the number of groups is very large.

It is clear that for all practical purposes the curve of Type V. is as good as that of Type VI. Indeed, there is practically no difference between them except for the ages 15 to 17. The fit is, however, not a very good one, and although it is indefinitely better than a normal curve, and we see why in the absence of these types the statistics could not be fitted with any of the first series of skew curves, yet we are compelled to consider that there are causes other than chance at work very definitely affecting the frequency of the recorded ages. Thus, the bridegrooms being 24 to 25, the desire of the bride to be recorded as younger than her husband probably fully accounts for the bulk of the preponderance of observation over theory in the frequency of the brides of 22 to 24. The defect of brides between 17 and 20 may be again due to the tendency to state the age as over 21, and so free the woman from the need for parental sanction.* These causes, giving a false displacement of age frequency, are probably in themselves sufficient to account for the theoretical defect in brides of 15 to 17.

Table of Observed and Calculated Frequencies.

Age        Observed       Calculated frequency.
           frequency.     Type V.     Type VI.
15-16          367            70          49
16-17          717           514         489
17-18         1294          1538        1560
18-19         2121          2751        2800
19-20         3156          3591        3622
20-21         4009          3830        3831
21-22         3593          3577        3560
22-23         3604          3055        3034
23-24         3060          2456        2439
24-25         1774          1894        1884
25-26         1353          1419        1415
26-27          936          1044        1043
27-28          663           758         760
28-29          468           546         549
29-30          319           392         395
30-31          256           281         282
31-32          164           201         198
32-33          134           148         146
33-34           94           104         105
34-35           77            75          76
35-36           68            55          55
36-37           59            40          40
37-38           33            29          29
38-39           40            21          22
39-40           27            16          16
40-41           18            12          12
41-42           21             9           9
42-43           11             7           7
43-44           14             5           5
44-45            4             4           4
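The calculated columns are ordinates of the two fitted equations at the middle of each year of age, since the paper compares ordinates rather than areas here. The sketch below evaluates them directly from the printed constants; the origins used, 10.5534 years for Type V. and 5.198570 years for Type VI., are the mean age minus the respective $\mu_1'$ values given above, and the function names are hypothetical.

```python
import math

# Constants printed in section (6.); ages in years.
TYPE_V = {"log10_y0": 22.3676952, "p": 13.150747, "gamma": 129.73081,
          "origin": 22.1877 - 11.6343}        # start of the Type V range
TYPE_VI = {"log10_y0": 24.2753032, "a": 8.268405, "q1": 25.88401,
           "q2": 11.77326, "origin": 22.1877 - 16.98913}


def ordinate_v(age: float) -> float:
    """y = y0 x**(-p) exp(-gamma/x), x measured from the Type V origin."""
    x = age - TYPE_V["origin"]
    if x <= 0:
        return 0.0
    return (10 ** TYPE_V["log10_y0"] * x ** -TYPE_V["p"]
            * math.exp(-TYPE_V["gamma"] / x))


def ordinate_vi(age: float) -> float:
    """y = y0 (x - a)**q2 / x**q1, zero before the start x = a."""
    x = age - TYPE_VI["origin"]
    if x <= TYPE_VI["a"]:
        return 0.0
    return (10 ** TYPE_VI["log10_y0"] * (x - TYPE_VI["a"]) ** TYPE_VI["q2"]
            / x ** TYPE_VI["q1"])


observed = {15.5: 367, 20.5: 4009, 25.5: 1353, 30.5: 256, 35.5: 68}
for mid, obs in observed.items():
    print(f"{mid - 0.5:.0f}-{mid + 0.5:.0f}: observed {obs}, "
          f"Type V {ordinate_v(mid):.0f}, Type VI {ordinate_vi(mid):.0f}")
```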


(7.) (B.) On the Variation in the Number of Lips of the Medusa P. Pentata. 

My data are the following, taken from a paper by Alfred Goldsborough Mayer : 
“The Variations of a Newly Arisen Species of Medusa,” ‘Science Bulletin of the 
Museum of the Brooklyn Institute,’ vol. 1, p. 1, 1901. 

* I have found in England that the statement of the bride's age in the marriage licence is, for the same reason, occasionally not in accordance with the year of birth as shown by the parish register.
No. of lips.     Frequency.
1                    2
2                    5
3                   18
4                  123
5                  798
6                   49
7                    1
Total              996


Mr. Mayer (p. 12) notes the failure of my curve of Type IV. I find for the constants:

Mean = 4.8685 lips.

$\mu_2 = .309006$, $\sigma = .55588$,
$\mu_3 = -.350697$,
$\mu_4 = 1.181718$,
$\beta_1 = 4.16834$,
$\beta_2 = 12.37598$,
$\kappa_1 = 6.24694$,
$\kappa_2 = 1.06594$.
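Constants of this kind follow from the frequency table by ordinary moment arithmetic. The sketch below (the function name is hypothetical) computes the mean, the central moments, $\beta_1$, $\beta_2$ and $\kappa_1$, treating each count as concentrated at its value, which is the "uncorrected" treatment used here for the lips. An optional flag applies Sheppard's corrections in their standard form for a unit group width; it is off by default, in keeping with the doubts expressed below about their use for discrete variation.

```python
def grouped_moments(values, freqs, sheppard: bool = False) -> dict:
    """Mean, central moments, beta1, beta2 and kappa1 of a frequency table."""
    values, freqs = list(values), list(freqs)
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n

    def central(k: int) -> float:
        return sum(f * (v - mean) ** k for v, f in zip(values, freqs)) / n

    mu2, mu3, mu4 = central(2), central(3), central(4)
    if sheppard:
        # Standard Sheppard corrections for a unit group width;
        # mu4 is corrected with the uncorrected mu2, mu3 is left alone.
        mu4 = mu4 - mu2 / 2 + 7 / 240
        mu2 = mu2 - 1 / 12
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    return {"n": n, "mean": mean, "mu2": mu2, "mu3": mu3, "mu4": mu4,
            "beta1": beta1, "beta2": beta2,
            "kappa1": 2 * beta2 - 3 * beta1 - 6}


# Lips of the medusa: 1 to 7 lips, uncorrected moments as in the text
lips = grouped_moments(range(1, 8), [2, 5, 18, 123, 798, 49, 1])
print({key: round(v, 6) for key, v in lips.items()})
```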

Since $\kappa_2$ is so nearly unity we may use Type V.

Hence I find:

$$p = 8.66184, \qquad \gamma = -8.811634$$

($\gamma$ must be negative since $\mu_3$ is negative), and

$$\mu_1' = 1.32270.$$

Thus the curve starts at 6.19118 lips, or the one medusa with seven lips is theoretically excluded. Here I have worked with the uncorrected moments because the lips are discontinuous variants. Working with Sheppard's corrective terms the limit is about six lips, and with the corrective terms suggested in my memoir on skew variation the limit is 7.65. Further we have:

$$\log y_0 = 6.8293633,$$

distance from mean to mode $= .30541$,

$$\text{Sk.} = .54941.$$

The mode is thus at 5.17389, in good agreement with observation.

The equation to the curve is, taking $x$ positive from 6.19118 lips towards lesser values:

$$\log y = 6.8293633 - 8.66184 \log x - \frac{3.8268435}{x}.$$
This curve was drawn on a large scale and its areas read off with an integrator. The following theoretical frequencies were obtained:

No. of lips.     Observation.     Calculation.
6 and over            50               47
5                    798              762
4                    123              160.5
3                     18               20
2                      5                5
1                      2                1.5


There would not be any serious divergence here, were it not for the group with four lips, which observation shows to be much under-represented. But it must be remembered that we have only seven groups, and that such a number is very insufficient for a good determination of the moments of a curve. Further, the variation is not really continuous, as indicated by the curve, but discrete. We have at present no clear statement as to how the moments of a discrete system of variation should be modified or corrected so as to give the best results for the moments of the continuous curve which is to theoretically represent the series. I am doubtful whether Sheppard's corrections, the best for continuous variation, are equally appropriate in this case. Above I have used merely the rough moments, but I have found by considerable experience that in the case of discrete variables, to treat the system as a polygon and correct, as in my memoir on Skew Variation ('Phil. Trans.,' A, vol. 186, p. 350), appears to give the best results when the areas are compared with the discrete groups. The point wants further investigation; when we have a large number of groups it is of little importance, but it makes a considerable difference in these excessively skew distributions of discrete variables when the number of groups is small.*

Above all, the diagram (fig. ii.) shows how all important it is to compare areas and 
not merely the ordinates of the frequency curve with the blocks representing the 
discrete frequencies in such a case as this. The wide-spread custom among foreign 
investigators of comparing merely the ordinates of the theoretical frequency curve 
with the observed frequencies leads in such cases to most fallacious results. 


(8.) (C.) On the Distribution of Incidence of Scarlet Fever Cases with Age. 

It seems desirable to give an illustration of the method of dealing with a distribution which falls under the class dealt with in Section (5) of this paper. Dr. Macdonell, in dealing with the intensity of incidence of different diseases at various ages, has come across in scarlet fever a good illustration of curves of the types now under consideration. The whole of the arithmetical work on the present example is due to him, and I have to thank him very heartily for allowing me to use it here.

The statistics are taken from the ‘ Report of the Metropolitan Asylums Board ’ 
(Statistical Part, 1899). They involve 39,253 male cases, distributed as follows :— 


Year of life.    Frequency.      Year of life.    Frequency.
Under 1              443          20-25                926
1-2                 1456          25-30                420
2-3                 2631          30-35                215
3-4                 3599          35-40                 91
4-5                 3862          40-45                 45
5-10               15791          45-50                 26
10-15               7359          50-55                 17
15-20               2366          55-60                  5
                                  60-65                  1


The data being grouped partly in one-year and partly in five-year periods, the moments had to be calculated with caution, separating the material into two pieces. Taking five years as the unit, Dr. Macdonell found for the uncorrected moments:

* E.g., petals of buttercups, teeth on the carapace of prawns, lips of medusae, as compared with veins on chestnut leaves, florets on ox-eyed daisy, &c.
Mean age of incidence, 8.60975 years.

$\mu_2 = 1.369345$,
$\mu_3 = 3.233194$,
$\mu_4 = 19.143575$.

The moments were not modified by Sheppard's corrections, for these suppose contact of a high order at both terminals of the curve, and it was quite apparent that the curve must rise at a finite angle on the birth side. The following additional constants were then determined:

$\beta_1 = 4.071222$, $\beta_2 = 10.209333$,

$\kappa_1 = 2.205000$, $\kappa_2 = 2.813783$.

Thus $\kappa_2$ is $> 1$ and $< \infty$, and the distribution is of Type VI. Now let us suppose the incidence of scarlet fever to start with birth, although there might, as in the case of enteric fever, be really some antenatal cases.*

Turning to Section (5) we have:

$c$ = distance from birth to mean = 8.60975 years = 1.72195 units.

Hence we deduce

$$\gamma_2 = .461819, \qquad \gamma_3 = .685596,$$

and so from (xxix.)

$$q_2 - q_1 = -10.532485, \qquad q_1 + q_2 = 15.417281;$$

or,

$$q_1 = 12.974883, \qquad q_2 = 2.442398.$$

Then from

$$c = \mu_1' - a = \frac{a (q_2 + 1)}{q_1 - q_2 - 2}$$

we find

$$a = 4.268104,$$

and, finally, after determining $y_0$ from (xxi.),

$$\log y_0 = 13.6525078.$$

Thus the values of the frequency are given by

$$\log y = 13.6525078 + 2.442398 \log (x - 4.268104) - 12.974883 \log x.$$

The origin of the curve is thus 4.268104 units before birth. The mode is given by

$$x_{mo} = \frac{a q_1}{q_1 - q_2} = 5.257842.$$

Thus: $x_{mo} - a = .989738$ units $= 4.94869$ yrs.

* See 'Phil. Trans.,' A, vol. 186, p. 390. The remarkably sharp rise of the scarlet-fever distribution as compared with the enteric is, however, much against this.
This gives for $y_{mode}$ the value 3892.

Distance between mode and mean = 3.66106 yrs.

Whence we find for skewness the value

$$\text{Sk.} = .5347.$$

The diagram (fig. iii.) shows that the fit may be considered a good one.



(9.) The conclusions of this paper are, I think, of some interest from the general standpoint of scientific investigation. A certain number of frequency distributions had been found, not only by my co-workers and myself here, but by biologists in America, not to fit into the general system of skew distributions dealt with by me in my original memoir. The first conclusion was that however wide-reaching that system appeared to be, it was a failure for a few remarkably skew distributions. But on more careful investigation of the differential equation it appeared that two types of solution had been left out of consideration, and that these were precisely those needed in the recorded cases of failure.

I owe some apology to authors like Professor Davenport and Dr. Duncker, who have recently issued text-books on the application of statistical methods to biological variation, because although we have known and used these curves for some years past, no account has hitherto been published of them, and, consequently, biological investigators* using their résumés of my methods have been, and I fear still may be, occasionally puzzled.

* E.g., Mr. A. G. Mayer in the paper on Medusas referred to above.

































































